
Worldwide Web Crawls

Wide crawls of the Internet conducted by Internet Archive. Please visit the Wayback Machine to explore archived web sites.




Part of: Internet Archive Web Crawls

Collection · 1.5B views
The seed for Wide00014 was:
- Slash pages from every domain on the web, drawn from:
  - a list of domains based on the Survey crawl seeds
  - a list of domains based on the Wide00012 web graph
  - a list of domains based on the Wide00013 web graph
- Top-ranked pages (up to a maximum of 100) from every linked-to domain, using the Wide00012 inter-domain navigational link graph:
  - a ranking of all URLs that have more than one incoming inter-domain link (rank was determined by the number of incoming links, using Wide00012 inter-domain links)...
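The ranking step in the second part of the Wide00014 seed can be illustrated with a small sketch. It is not the Archive's actual pipeline: it assumes the inter-domain link graph is available as hypothetical `(source_url, target_url)` pairs, treats the lowercase hostname as the "domain" (a simplification), counts each listed link once, keeps only URLs with more than one incoming inter-domain link, and returns up to 100 top-ranked URLs per domain. The function names and the toy data are illustrative only.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    # Simplifying assumption: the lowercase hostname stands in for the "domain".
    return (urlsplit(url).hostname or "").lower()


def top_pages_per_domain(links, max_per_domain=100):
    """Rank URLs by incoming inter-domain links; keep up to max_per_domain per domain.

    links: iterable of (source_url, target_url) pairs (hypothetical input format).
    """
    incoming = Counter()
    for src, dst in links:
        # Only links that cross a domain boundary count toward a URL's rank.
        if domain_of(src) != domain_of(dst):
            incoming[dst] += 1

    by_domain = defaultdict(list)
    for url, count in incoming.items():
        # Keep only URLs with more than one incoming inter-domain link.
        if count > 1:
            by_domain[domain_of(url)].append((count, url))

    seeds = {}
    for dom, ranked in by_domain.items():
        # Most-linked first; ties broken by URL so the output is deterministic.
        ranked.sort(key=lambda pair: (-pair[0], pair[1]))
        seeds[dom] = [url for _, url in ranked[:max_per_domain]]
    return seeds


# Toy link graph (hypothetical data).
links = [
    ("http://a.example/", "http://b.example/page1"),
    ("http://c.example/", "http://b.example/page1"),
    ("http://a.example/", "http://b.example/page2"),
    ("http://c.example/", "http://b.example/page2"),
    ("http://d.example/", "http://b.example/page2"),
    ("http://b.example/", "http://b.example/page3"),  # same domain: ignored
    ("http://a.example/", "http://b.example/page3"),  # only one inter-domain link
]
print(top_pages_per_domain(links))
# {'b.example': ['http://b.example/page2', 'http://b.example/page1']}
```

Here `page2` (three inter-domain links) outranks `page1` (two), and `page3` is dropped because only one of its incoming links crosses a domain boundary.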

Wide Crawl started April 2013
Collection · 25,035 items · 976.2M views
Web wide crawl with initial seedlist and crawler configuration from April 2013.

Collection · 875.5M views
Web wide crawl.

Wide Crawl Number 12 - started March, 14th 2015
Collection · 49,621 items · 870M views
Web wide crawl with initial seedlist and crawler configuration from January 2015.

Collection · 866.4M views
Wide17 was seeded with the "Total Domains" list of 256,796,456 URLs provided by Domains Index on June 26th, and crawled with max-hops set to "3" and de-duplication set to "on".
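The two crawl settings named for Wide17 can be illustrated with a toy crawler. In crawlers such as Heritrix, a max-hops limit caps how many link hops away from a seed the crawl may go; in web archiving, de-duplication commonly means not storing a response body identical to one already captured. The sketch below assumes those meanings and is not Heritrix's implementation or the actual Wide17 configuration: `crawl`, the `fetch` callback, and the in-memory `pages` site are hypothetical, and SHA-1 content digests stand in for whatever digest the real crawler uses.

```python
import hashlib
from collections import deque


def crawl(seeds, fetch, max_hops=3):
    """Breadth-first crawl that follows links at most max_hops hops from a seed.

    fetch(url) -> (body: bytes, outlinks: list[str]) is a hypothetical stand-in
    for the real fetcher and link extractor.
    Returns (stored, duplicates): URLs whose bodies were stored, and URLs skipped
    because a byte-identical body had already been stored.
    """
    seeds = list(dict.fromkeys(seeds))        # drop repeated seeds, keep order
    seen_urls = set(seeds)                    # never queue the same URL twice
    seen_digests = set()                      # digests of bodies already stored
    queue = deque((url, 0) for url in seeds)  # seeds are hop 0
    stored, duplicates = [], []

    while queue:
        url, hops = queue.popleft()
        body, outlinks = fetch(url)
        digest = hashlib.sha1(body).hexdigest()
        if digest in seen_digests:
            duplicates.append(url)            # de-duplication: same content, not stored again
        else:
            seen_digests.add(digest)
            stored.append(url)
        if hops < max_hops:                   # follow links only while under the hop limit
            for link in outlinks:
                if link not in seen_urls:
                    seen_urls.add(link)
                    queue.append((link, hops + 1))
    return stored, duplicates


# Toy site (hypothetical): s -> h1 -> h2 -> h3 -> h4, plus a mirror of s.
pages = {
    "s": (b"seed", ["h1", "mirror"]),
    "h1": (b"one", ["h2"]),
    "h2": (b"two", ["h3"]),
    "h3": (b"three", ["h4"]),
    "h4": (b"four", []),
    "mirror": (b"seed", []),
}
print(crawl(["s"], lambda url: pages[url], max_hops=3))
# (['s', 'h1', 'h2', 'h3'], ['mirror'])
```

With the limit at 3, `h4` (four hops from the seed) is never fetched, and `mirror` is fetched but not stored because its body matches the seed's.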

Wide Crawl started June 2014
Collection · 45,341 items · 853.8M views
Web wide crawl with initial seedlist and crawler configuration from June 2014.

Collection · 707.5M views
Web wide crawl number 16. The seed list for Wide00016 was made from the join of the top 1 million domains from Cisco and the top 1 million domains from Alexa.
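The description does not say exactly what "join" means for the Wide00016 seed list. The sketch below assumes it means a de-duplicated union of the two domain lists and also shows the other possible reading (only domains present in both). The normalization rule (ignore case, surrounding whitespace, and a trailing dot), the function names, and the sample entries are all hypothetical; the sample entries are not taken from the actual Cisco or Alexa lists.

```python
def normalize(domain: str) -> str:
    # Assumed normalization: ignore case, surrounding whitespace, and a trailing dot.
    return domain.strip().lower().rstrip(".")


def merge_seed_domains(list_a, list_b):
    """De-duplicated union of two domain lists, preserving first-seen order."""
    merged = dict.fromkeys(normalize(d) for d in [*list_a, *list_b])
    return list(merged)


def common_domains(list_a, list_b):
    """Alternative reading of "join": only domains that appear in both lists."""
    in_b = {normalize(d) for d in list_b}
    return [d for d in merge_seed_domains(list_a, []) if d in in_b]


# Hypothetical sample entries, not taken from the actual Cisco or Alexa lists.
cisco_top = ["example.com", "Example.net", "cdn.example"]
alexa_top = ["example.com", "example.net.", "example.org"]
print(merge_seed_domains(cisco_top, alexa_top))
# ['example.com', 'example.net', 'cdn.example', 'example.org']
print(common_domains(cisco_top, alexa_top))
# ['example.com', 'example.net']
```

Normalizing before merging matters: without it, `Example.net` and `example.net.` would be counted as two different seeds.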

Wide Crawl started August 2013
Collection · 21,932 items · 637.2M views
Web wide crawl with initial seedlist and crawler configuration from August 2013.

Wide Crawl Number 13
Collection · 46,050 items · 625.5M views
Web Wide Crawl Number 13

Wide Crawl started January 2012
Collection · 30,373 items · 561.6M views
Web wide crawl with initial seedlist and crawler configuration from January 2012 using HQ software.

Wide Crawl started April 2012
Collection · 39,279 items · 491.3M views
Web wide crawl with initial seedlist and crawler configuration from April 2012.

Wide Crawl started February 2014
Collection · 9,806 items · 383.8M views
Web wide crawl with initial seedlist and crawler configuration from February 2014.

Wide Crawl started October 2010
Collection · 15,839 items · 371M views
Web wide crawl with initial seedlist and crawler configuration from October 2010.

Wide Crawl Started January 2013
Collection · 15,157 items · 361.8M views
Wide crawls of the Internet conducted by Internet Archive. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.

Wide Crawl started September 2012
Collection · 22,423 items · 357.9M views
Web wide crawl with initial seedlist and crawler configuration from September 2012.

Wide Crawl started October 2011
Collection · 12,648 items · 334.4M views
Web wide crawl with initial seedlist and crawler configuration from March 2011 using HQ software.

Wide Crawl started March 2011
Collection · 8,528 items · 313.1M views
Web wide crawl with initial seedlist and crawler configuration from March 2011. This uses the new HQ software for distributed crawling by Kenji Nagahashi.
What’s in the data set:
- Crawl start date: 09 March, 2011
- Crawl end date: 23 December, 2011
- Number of captures: 2,713,676,341
- Number of unique URLs: 2,273,840,159
- Number of hosts: 29,032,069
The seed list for this crawl was a list of Alexa’s top 1 million web sites, retrieved close to the crawl start date. We used Heritrix (3.1.1-SNAPSHOT)...
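A few derived figures follow directly from the March 2011 statistics listed above. The snippet only does arithmetic on those stated numbers; the results are averages and say nothing about how captures were distributed across URLs or hosts. "Repeat captures" here simply means captures beyond the first one for each unique URL.

```python
from datetime import date

# Figures as stated in the March 2011 collection description.
captures = 2_713_676_341
unique_urls = 2_273_840_159
hosts = 29_032_069
start, end = date(2011, 3, 9), date(2011, 12, 23)

repeat_captures = captures - unique_urls  # captures beyond the first per unique URL
captures_per_url = captures / unique_urls
urls_per_host = unique_urls / hosts
duration_days = (end - start).days        # days elapsed from start date to end date

print(f"{repeat_captures:,} repeat captures")              # 439,836,182 repeat captures
print(f"{captures_per_url:.2f} captures per unique URL")   # 1.19 captures per unique URL
print(f"{urls_per_host:.1f} unique URLs per host")         # 78.3 unique URLs per host
print(f"{duration_days} days")                             # 289 days
```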

Host Screen Captures
Collection · 17,291 items · 50.3M views
Screen captures of hosts discovered during wide crawls. This data is currently not publicly accessible.

Wide Crawl started September 2010
Collection · 332 items · 10.7M views
Web wide crawl with initial seedlist and crawler configuration from September 2010.
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Mon Feb 12 21:42:38 PST 2018 to Mon Feb 12 15:20:34 PST 2018.
Topic: crawldata
Wide Crawl started January 2012 · web · 5.9M views · 0 favorites · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl423.us.archive.org:wide from Tue Jan 17 08:02:53 PST 2012 to Tue Jan 17 01:16:20 PST 2012.
Topic: crawldata
Wide Crawl started February 2014 · web · 5.7M views · 0 favorites · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl453.us.archive.org:wide from Wed Feb 19 01:09:37 PST 2014 to Tue Feb 18 21:33:27 PST 2014.
Topic: crawldata
Wide Crawl started February 2014 · web · 5.7M views · 0 favorites · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl454.us.archive.org:wide from Wed Feb 19 05:20:19 PST 2014 to Wed Feb 19 01:54:33 PST 2014.
Topic: crawldata
Wide Crawl started February 2014 · web · 5.6M views · 0 favorites · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Tue Feb 18 17:01:58 PST 2014 to Tue Feb 18 13:14:06 PST 2014.
Topic: crawldata
Wide Crawl started February 2014 · web · 5.6M views · 0 favorites · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Wed Feb 19 09:49:01 PST 2014 to Wed Feb 19 06:07:15 PST 2014.
Topic: crawldata
Wide Crawl started February 2014 · web · 5.5M views · 0 favorites · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Feb 19 07:58:38 PST 2014 to Wed Feb 19 05:13:46 PST 2014.
Topic: crawldata
Wide Crawl started February 2014 · web · 4.1M views · 0 favorites · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Wed Feb 19 08:18:23 PST 2014 to Wed Feb 19 04:21:37 PST 2014.
Topic: crawldata
Wide Crawl started February 2014 · web · 4.1M views · 0 favorites · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Tue Feb 18 22:58:46 PST 2014 to Tue Feb 18 19:25:19 PST 2014.
Topic: crawldata
Wide Crawl started June 2014 · web · 3.5M views · 0 favorites · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Thu Jul 10 06:43:41 PDT 2014 to Thu Jul 10 01:23:01 PDT 2014.
Topic: crawldata
Wide Crawl started June 2014 · web · 3.5M views · 0 favorites · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Thu Jul 10 07:24:15 PDT 2014 to Thu Jul 10 01:45:52 PDT 2014.
Topic: crawldata
Wide Crawl Number 12 - started March, 14th 2015 · web · 2.8M views · 0 favorites · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Sat Mar 14 23:38:56 PDT 2015 to Sat Mar 14 17:31:22 PDT 2015.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl805.us.archive.org:wide from Mon May 13 17:55:38 PDT 2019 to Mon May 13 13:45:55 PDT 2019.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Wed Jan 4 01:00:14 PST 2017 to Tue Jan 3 19:50:56 PST 2017.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl802.us.archive.org:wide from Sun Aug 18 23:24:01 PDT 2019 to Mon Aug 19 10:00:17 PDT 2019.
Topic: crawldata
Wide Crawl started February 2014 · web · 2M views · 1 favorite · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl416.us.archive.org:wide from Sat Feb 8 03:46:42 PST 2014 to Fri Feb 7 23:17:16 PST 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl428.us.archive.org:wide from Tue Jun 13 00:55:34 PDT 2017 to Mon Jun 12 19:36:27 PDT 2017.
Topic: crawldata
Wide Crawl started February 2014 · web · 1.9M views · 0 favorites · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl414.us.archive.org:wide from Sat Feb 8 04:46:28 PST 2014 to Sat Feb 8 00:01:23 PST 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Fri Aug 3 21:18:13 PDT 2018 to Sun Aug 5 13:22:13 PDT 2018.
Topic: crawldata
Wide Crawl started January 2012 · web · 1.7M views · 0 favorites · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Sat Jan 21 04:01:50 PST 2012 to Fri Jan 20 21:01:34 PST 2012.
Topic: crawldata
Wide Crawl Number 12 - started March, 14th 2015 · web · 1.5M views · 0 favorites · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Sat Mar 14 20:34:37 PDT 2015 to Sat Mar 14 14:59:03 PDT 2015.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl809.us.archive.org:wide from Mon Aug 1 22:32:39 PDT 2016 to Mon Aug 1 17:39:28 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Fri Mar 25 18:07:39 PDT 2016 to Fri Mar 25 13:34:00 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl423.us.archive.org:wide from Tue Jun 6 06:46:26 PDT 2017 to Tue Jun 6 01:11:01 PDT 2017.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl809.us.archive.org:wide from Tue Jun 6 00:51:50 PDT 2017 to Mon Jun 5 19:34:08 PDT 2017.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl801.us.archive.org:wide from Tue Jun 6 07:52:14 PDT 2017 to Tue Jun 6 03:02:49 PDT 2017.
Topic: crawldata
Wide Crawl started April 2013 · web · 1.4M views · 1 favorite · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl415.us.archive.org:wide from Wed May 15 12:25:51 PDT 2013 to Wed May 15 06:56:55 PDT 2013.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl428.us.archive.org:wide from Thu Jul 25 04:17:23 PDT 2019 to Wed Jul 24 23:16:06 PDT 2019.
Topic: crawldata
Wide Crawl started February 2014 · web · 1.2M views · 0 favorites · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl338.us.archive.org:wide from Sat Feb 22 06:42:19 PST 2014 to Sat Feb 22 01:03:35 PST 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl424.us.archive.org:wide from Tue Aug 16 09:06:32 PDT 2016 to Tue Aug 16 03:02:57 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Sat Jun 3 01:20:30 PDT 2017 to Sat Jun 3 14:17:04 PDT 2017.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl802.us.archive.org:wide from Sun Aug 5 00:35:40 PDT 2018 to Tue Aug 7 01:25:09 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl813.us.archive.org:wide from Tue Jun 6 23:57:11 PDT 2017 to Tue Jun 6 17:57:05 PDT 2017.
Topic: crawldata
Wide Crawl started June 2014 · web · 1M views · 0 favorites · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl424.us.archive.org:wide from Tue Jul 1 13:59:42 PDT 2014 to Tue Jul 1 08:25:02 PDT 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl808.us.archive.org:wide from Wed Jun 7 01:07:21 PDT 2017 to Tue Jun 6 19:06:13 PDT 2017.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Wed Jun 7 00:20:30 PDT 2017 to Tue Jun 6 18:18:45 PDT 2017.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl806.us.archive.org:wide from Mon Aug 6 06:15:02 PDT 2018 to Wed Aug 8 17:46:21 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl802.us.archive.org:wide from Mon Aug 6 09:28:15 PDT 2018 to Wed Aug 8 18:02:43 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl808.us.archive.org:wide from Tue Jun 6 23:48:51 PDT 2017 to Tue Jun 6 17:43:39 PDT 2017.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Sun Aug 5 05:36:11 PDT 2018 to Mon Aug 6 06:38:33 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Sat Aug 4 20:34:59 PDT 2018 to Mon Aug 6 22:41:00 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl809.us.archive.org:wide from Sat Oct 22 11:25:13 PDT 2016 to Sat Oct 22 14:28:42 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Sun Aug 5 04:29:12 PDT 2018 to Mon Aug 6 09:39:30 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl428.us.archive.org:wide from Sun Aug 5 05:35:10 PDT 2018 to Mon Aug 6 17:04:32 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl425.us.archive.org:wide from Sun Aug 5 03:16:34 PDT 2018 to Mon Aug 6 16:55:41 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl811.us.archive.org:wide from Sun Aug 5 01:02:39 PDT 2018 to Tue Aug 7 17:04:42 PDT 2018.
Topic: crawldata
Wide Crawl started February 2014 · web · 1M views · 0 favorites · 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl417.us.archive.org:wide from Sat Feb 22 09:02:42 PST 2014 to Sat Feb 22 06:09:28 PST 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Sun Aug 5 05:22:26 PDT 2018 to Mon Aug 6 22:26:16 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Mon Aug 6 13:38:33 PDT 2018 to Wed Aug 8 08:00:38 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl428.us.archive.org:wide from Mon Aug 6 12:03:20 PDT 2018 to Wed Aug 8 23:51:48 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Mon Aug 6 06:15:56 PDT 2018 to Wed Aug 8 04:21:46 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl800.us.archive.org:wide from Sun Aug 5 03:33:48 PDT 2018 to Tue Aug 7 01:48:53 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Sun Aug 5 05:32:56 PDT 2018 to Tue Aug 7 09:24:53 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl424.us.archive.org:wide from Mon Aug 6 13:48:48 PDT 2018 to Wed Aug 8 11:44:48 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl804.us.archive.org:wide from Sun Aug 5 04:04:57 PDT 2018 to Tue Aug 7 13:12:57 PDT 2018.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl813.us.archive.org:wide from Sun Aug 5 06:35:49 PDT 2018 to Mon Aug 6 18:20:05 PDT 2018.
Topic: crawldata